-
Notifications
You must be signed in to change notification settings - Fork 3.6k
feat(fabric): introduce process-safe port management #21313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(fabric): introduce process-safe port management #21313
Conversation
219bc75 to
c48259e
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #21313 +/- ##
========================================
- Coverage 87% 87% -1%
========================================
Files 270 272 +2
Lines 23822 24150 +328
========================================
+ Hits 20795 20950 +155
- Misses 3027 3200 +173 |
1504dd1 to
252396e
Compare
|
@littlebullGit PR is looking good, can you please check the failing check and see if it just a random failure (it seemed to work before merging master) |
@SkafteNicki , the error seems not related. It failed to start a server within 20 sec. Maybe just try again. FAILED serve/test_servable_module_validator.py::test_servable_module_validator - Exception: The server didn't start within 20 seconds. |
cc125d4 to
3612cc4
Compare
relaxed the timeout check and retried the build. All passed now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @littlebullGit for the work on this. The intent to address the port-collision issue was spot-on. Really appreciate the deep effort here.
After reviewing the failures, it turned out the core problem was isolated to standalone tests not acquiring a free port. Since they're not part of the distributed runner workflow, a simpler fix in the standalone test setup resolves the issue without the additional infrastructure.
Proposed fix here: #21335
Would love your review and thoughts when you get a moment.
Thanks again for pushing on this and exploring the problem space so thoroughly. 🙌🏻⚡️
What does this PR do?
Addresses comment: #21309 (comment)
Summary
📚 Documentation preview 📚: https://pytorch-lightning--21313.org.readthedocs.build/en/21313/